Towards Introducing Long-term Statistics in Muse for Robust Speech Recognition
Authors
Abstract
In this paper, we propose new developments of the MUltipath Stochastic Equalization (MUSE) technique. MUSE is based on an enriched model of speech that combines a classical HMM-based model of clean speech with equalization functions. This technique can reduce the recognition error rate caused by a mismatch between training and testing conditions. To track long-term variations of this mismatch, we study the introduction of a priori statistics on the equalization function. For the case of bias removal, this approach has been implemented in HTK and tested on the Numbers95 database. Experiments show that the bias computation converges quickly enough to limit the effect of the a priori values. However, both this fast convergence and the proposed framework open research directions towards more complex equalization functions.
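The abstract does not give the estimation formulas, so the following is only a minimal sketch, under assumptions of our own, of how an a priori statistic can enter a bias-removal estimate. It assumes one cepstral dimension, an additive bias b on the observations, a hard frame-to-state alignment with per-frame state means and variances, and a Gaussian prior on b. The resulting maximum a posteriori (MAP) estimate treats the prior as one extra precision-weighted pseudo-observation. The names `BiasPrior` and `map_bias` are hypothetical; this is not the MUSE implementation in HTK.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass
class BiasPrior:
    mean: float      # hypothetical prior mean b0 of the bias (one cepstral dimension)
    variance: float  # hypothetical prior variance tau^2 of the bias


def map_bias(observations: Sequence[float],
             state_means: Sequence[float],
             state_vars: Sequence[float],
             prior: BiasPrior) -> float:
    """MAP estimate of an additive bias b in one cepstral dimension.

    Assumed model: x_t = mu_{s_t} + b + noise, noise ~ N(0, var_{s_t}),
    with the alignment s_t already known and a Gaussian prior
    b ~ N(prior.mean, prior.variance).
    """
    if not (len(observations) == len(state_means) == len(state_vars)):
        raise ValueError("one aligned state mean/variance per frame is required")
    # Start the precision-weighted sums with the prior term.
    num = prior.mean / prior.variance
    den = 1.0 / prior.variance
    # Each frame adds its residual (x - mu), weighted by the state precision.
    for x, mu, var in zip(observations, state_means, state_vars):
        num += (x - mu) / var
        den += 1.0 / var
    return num / den


if __name__ == "__main__":
    prior = BiasPrior(mean=0.0, variance=1.0)
    for n_frames in (1, 10, 100):
        xs = [0.5] * n_frames   # synthetic frames shifted by a bias of 0.5
        mus = [0.0] * n_frames  # aligned clean-speech state means
        vs = [1.0] * n_frames   # aligned state variances
        print(n_frames, round(map_bias(xs, mus, vs, prior), 4))
```

With these synthetic values the estimate moves from 0.25 after one frame to about 0.495 after 100 frames, approaching the data-only estimate of 0.5. This is one way to picture the abstract's observation that fast convergence of the bias computation limits the influence of the a priori values.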
Similar articles
An Information-Theoretic Discussion of Convolutional Bottleneck Features for Robust Speech Recognition
Convolutional Neural Networks (CNNs) have demonstrated their performance in speech recognition systems, both for feature extraction and for acoustic modeling. In addition, CNNs have been used for robust speech recognition, with competitive results reported. The Convolutive Bottleneck Network (CBN) is a kind of CNN that has a bottleneck layer among its fully connected layers. The bottleneck fea...
Improving the performance of MFCC for Persian robust speech recognition
Mel-frequency cepstral coefficients (MFCCs) are the most widely used features in speech recognition, but they are very sensitive to noise. In this paper, to achieve satisfactory performance in Automatic Speech Recognition (ASR) applications, we introduce a new noise-robust set of MFCC vectors estimated through the following steps. First, spectral mean normalization is a pre-processing which applies to t...
Speech Emotion Recognition Using Scalogram Based Deep Structure
Speech Emotion Recognition (SER) is an important part of speech-based Human-Computer Interface (HCI) applications. Previous SER methods rely on the extraction of features and training an appropriate classifier. However, most of those features can be affected by emotionally irrelevant factors such as gender, speaking styles and environment. Here, an SER method has been proposed based on a concat...
Spoken Term Detection for Persian News of Islamic Republic of Iran Broadcasting
Islamic Republic of Iran Broadcasting (IRIB), as one of the biggest broadcasting organizations, produces thousands of hours of media content daily. Accordingly, IRIB's archive is one of the richest archives in Iran, containing a huge amount of multimedia data. Monitoring this massive volume of data, and browsing and retrieval of this archive, is one of the key issues for this broadcasting...
MUSE CSP : An Extension to the Constraint Satisfaction
This paper describes an extension to the constraint satisfaction problem (CSP) called MUSE CSP (MUltiply SEgmented Constraint Satisfaction Problem). This extension is especially useful for those problems which segment into multiple sets of partially shared variables. Such problems arise naturally in signal processing applications including computer vision, speech processing, and handwriting r...